54 research outputs found

Pure Exploration with Multiple Correct Answers

Author: Degenne Rémy
Koolen Wouter M.
Publication date: 01/01/2019

We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.

arXiv.org e-Print Archive

CWI's Institutional Repository

Second-order Quantile Methods for Experts and Combinatorial Games

Author: Koolen Wouter M.
van Erven Tim
Publication date: 01/01/2015

We aim to design strategies for sequential decision making that adjust to the difficulty of the learning problem. We study this question both in the setting of prediction with expert advice, and for more general combinatorial decision tasks. We are not satisfied with just guaranteeing minimax regret rates, but we want our algorithms to perform significantly better on easy data. Two popular ways to formalize such adaptivity are second-order regret bounds and quantile bounds. The underlying notions of 'easy data', which may be paraphrased as "the learning problem has small variance" and "multiple decisions are useful", are synergetic. But even though there are sophisticated algorithms that exploit one of the two, no existing algorithm is able to adapt to both. In this paper we outline a new method for obtaining such adaptive algorithms, based on a potential function that aggregates a range of learning rates (which are essential tuning parameters). By choosing the right prior we construct efficient algorithms and show that they reap both benefits by proving the first bounds that are both second-order and incorporate quantiles.

arXiv.org e-Print Archive

CWI's Institutional Repository

Queensland University of Technology ePrints Archive

Universal Codes from Switching Strategies

Author: de Rooij Steven
Koolen Wouter M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

We discuss algorithms for combining sequential prediction strategies, a task which can be viewed as a natural generalisation of the concept of universal coding. We describe a graphical language based on Hidden Markov Models for defining prediction strategies, and we provide both existing and new models as examples. The models include efficient, parameterless models for switching between the input strategies over time, including a model for the case where switches tend to occur in clusters, and finally a new model for the scenario where the prediction strategies have a known relationship, and where jumps are typically between strongly related ones. This last model is relevant for coding time series data where parameter drift is expected. As theoretical contributions we introduce an interpolation construction that is useful in the development and analysis of new algorithms, and we establish a new sophisticated lemma for analysing the individual sequence regret of parameterised models.

arXiv.org e-Print Archive

CWI's Institutional Repository

Queensland University of Technology ePrints Archive

Online Isotonic Regression

Author: Koolen Wouter M.
Kotłowski Wojciech
Malek Alan
Publication date: 06/06/2016

We consider the online version of the isotonic regression problem. Given a set of linearly ordered points (e.g., on the real line), the learner must predict labels sequentially at adversarially chosen positions and is evaluated by her total squared loss compared against the best isotonic (non-decreasing) function in hindsight. We survey several standard online learning algorithms and show that none of them achieve the optimal regret exponent; in fact, most of them (including Online Gradient Descent, Follow the Leader and Exponential Weights) incur linear regret. We then prove that the Exponential Weights algorithm played over a covering net of isotonic functions has a regret bounded by $O\big(T^{1/3} \log^{2/3}(T)\big)$ and present a matching $\Omega(T^{1/3})$ lower bound on regret. We provide a computationally efficient version of this algorithm. We also analyze the noise-free case, in which the revealed labels are isotonic, and show that the bound can be improved to $O(\log T)$ or even to $O(1)$ (when the labels are revealed in isotonic order). Finally, we extend the analysis beyond squared loss and give bounds for entropic loss and absolute loss. Comment: 25 pages
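The comparator named in this abstract, the best isotonic function in hindsight under squared loss, can be computed offline. The sketch below uses the standard Pool Adjacent Violators procedure, which is not the online algorithm from the paper. It assumes the labels are listed in the order of their positions, and the names `isotonic_fit` and `hindsight_loss` are illustrative only.

```python
def isotonic_fit(labels):
    """Least-squares non-decreasing fit via Pool Adjacent Violators.

    Assumes `labels` is ordered by position (an assumption of this sketch).
    """
    blocks = []  # each block holds [sum of labels, number of labels]
    for y in labels:
        blocks.append([float(y), 1])
        # Merge the last two blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)  # every point in a block gets the block mean
    return fit


def hindsight_loss(labels):
    """Total squared loss of the best isotonic function in hindsight."""
    fit = isotonic_fit(labels)
    return sum((y - f) ** 2 for y, f in zip(labels, fit))
```

A learner's regret in this setting would be her own total squared loss minus `hindsight_loss(labels)`.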

arXiv.org e-Print Archive

CWI's Institutional Repository

Lipschitz Adaptivity with Multiple Learning Rates in Online Learning

Author: Koolen Wouter M.
Mhammedi Zakaria
van Erven Tim
Publication date: 30/05/2019

We aim to design adaptive online learning algorithms that take advantage of any special structure that might be present in the learning task at hand, with as little manual tuning by the user as possible. A fundamental obstacle that comes up in the design of such adaptive algorithms is to calibrate a so-called step-size or learning rate hyperparameter depending on variance, gradient norms, etc. A recent technique promises to overcome this difficulty by maintaining multiple learning rates in parallel. This technique has been applied in the MetaGrad algorithm for online convex optimization and the Squint algorithm for prediction with expert advice. However, in both cases the user still has to provide in advance a Lipschitz hyperparameter that bounds the norm of the gradients. Although this hyperparameter is typically not available in advance, tuning it correctly is crucial: if it is set too small, the methods may fail completely; but if it is taken too large, performance deteriorates significantly. In the present work we remove this Lipschitz hyperparameter by designing new versions of MetaGrad and Squint that adapt to its optimal value automatically. We achieve this by dynamically updating the set of active learning rates. For MetaGrad, we further improve the computational efficiency of handling constraints on the domain of prediction, and we remove the need to specify the number of rounds in advance. Comment: 22 pages. To appear in COLT 201

arXiv.org e-Print Archive

CWI's Institutional Repository

Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning

Author: Grünwald Peter
Koolen Wouter M.
van Erven Tim
Publication date: 20/05/2016

We consider online learning algorithms that guarantee worst-case regret rates in adversarial environments (so they can be deployed safely and will perform robustly), yet adapt optimally to favorable stochastic environments (so they will perform well in a variety of settings of practical importance). We quantify the friendliness of stochastic environments by means of the well-known Bernstein (a.k.a. generalized Tsybakov margin) condition. For two recent algorithms (Squint for the Hedge setting and MetaGrad for online convex optimization) we show that the particular form of their data-dependent individual-sequence regret guarantees implies that they adapt automatically to the Bernstein parameters of the stochastic environment. We prove that these algorithms attain fast rates in their respective settings both in expectation and with high probability.

arXiv.org e-Print Archive

CWI's Institutional Repository

Lipschitz and Comparator-Norm Adaptivity in Online Learning

Author: Koolen Wouter M.
Mhammedi Zakaria
Publication date: 27/02/2020

We study Online Convex Optimization in the unbounded setting where neither predictions nor gradients are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. We first develop parameter-free and scale-free algorithms for a simplified setting with hints. We present two versions: the first adapts to the squared norms of both comparator and gradients separately using $O(d)$ time per round, the second adapts to their squared inner products (which measure variance only in the comparator direction) in time $O(d^3)$ per round. We then generalize two prior reductions to the unbounded setting; one to not need hints, and a second to deal with the range ratio problem (which already arises in prior work). We discuss their optimality in light of prior and new lower bounds. We apply our methods to obtain sharper regret bounds for scale-invariant online prediction with linear models. Comment: 30 pages, 1 figure

arXiv.org e-Print Archive

CWI's Institutional Repository

Kolmogorov Complexity Theory over the Reals

Author: Koolen Wouter M.
Ziegler Martin
Publication date: 01/01/2008

Kolmogorov Complexity constitutes an integral part of computability theory, information theory, and computational complexity theory -- in the discrete setting of bits and Turing machines. Over real numbers, on the other hand, the BSS-machine (aka real-RAM) has been established as a major model of computation. This real realm has turned out to exhibit natural counterparts to many notions and results in classical complexity and recursion theory, although usually with considerably different proofs. The present work investigates similarities and differences between discrete and real Kolmogorov Complexity as introduced by Montana and Pardo (1998).

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Adaptive Hedge

Author: de Rooij Steven
Grünwald Peter
Koolen Wouter M.
van Erven Tim
Publication date: 01/01/2011

Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods. Comment: This is the full version of the paper with the same name that will appear in Advances in Neural Information Processing Systems 24 (NIPS 2011), 2012. The two papers are identical, except that this version contains an extra section of Additional Material
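For readers unfamiliar with the Hedge algorithm mentioned in this abstract, here is a minimal sketch of Hedge with a fixed learning rate. It is not the adaptive tuning proposed in the paper. The function name `hedge`, the parameter `eta`, and the assumption that losses lie in [0, 1] are choices made for this illustration.

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge with a fixed learning rate `eta`.

    `loss_rounds` is a list of per-round loss vectors, one entry per action,
    assumed here to lie in [0, 1]. Returns (learner_loss, regret), where
    regret is measured against the best single action in hindsight.
    """
    k = len(loss_rounds[0])
    cumulative = [0.0] * k  # cumulative loss of each action so far
    learner_loss = 0.0
    for losses in loss_rounds:
        # Weights are proportional to exp(-eta * cumulative loss); subtracting
        # the minimum keeps the exponentials numerically stable.
        m = min(cumulative)
        weights = [math.exp(-eta * (c - m)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        # The learner suffers the expected loss under its distribution.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss, learner_loss - min(cumulative)
```

On an easy instance such as `hedge([[0.0, 1.0]] * 100, eta=1.0)`, where one action is always better, the regret stays small, which is the kind of behaviour the abstract contrasts with worst-case tuning.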

arXiv.org e-Print Archive

CiteSeerX

CWI's Institutional Repository